Conversation
Fix parseEndpointList to handle single and double quotes that are treated as literal characters on Windows.

- Strip surrounding quotes before parsing comma-separated endpoints
- Fixes test failures on Windows CI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
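The quote-stripping approach can be sketched as follows. The real `parseEndpointList` internals are not shown in this PR excerpt, so the helper below is an illustrative reconstruction, not the actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// stripSurroundingQuotes removes one matching pair of single or double
// quotes around s. On Windows, quotes are passed through as literal
// characters instead of being consumed by the shell.
func stripSurroundingQuotes(s string) string {
	if len(s) >= 2 {
		first, last := s[0], s[len(s)-1]
		if first == last && (first == '"' || first == '\'') {
			return s[1 : len(s)-1]
		}
	}
	return s
}

// parseEndpointList splits a comma-separated endpoint list after
// stripping any surrounding quotes (illustrative reconstruction).
func parseEndpointList(raw string) []string {
	raw = stripSurroundingQuotes(raw)
	var endpoints []string
	for _, part := range strings.Split(raw, ",") {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			endpoints = append(endpoints, trimmed)
		}
	}
	return endpoints
}

func main() {
	// On Windows the quotes survive into the argument; strip them first.
	fmt.Println(parseEndpointList(`"host1:50052, host2:50052"`))
}
```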
RPC health checking now runs continuously from process creation until proxy shutdown, completely independent of whether the model is loaded, starting, stopped, or in any other state.

- Start health checker in NewProcess when rpcHealthCheck is enabled
- Remove stopRPCHealthChecker - only stops on proxy shutdown
- Remove state checks from health checker goroutine
- Health status always reflects current RPC endpoint availability

Previously, the health checker only ran while a process was in StateReady, causing stale health data when processes stopped. Now /v1/models always shows accurate RPC health regardless of model state.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Work in progress on web configuration feature. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The requestTimeout feature was not working because the timeout context was not connected to the HTTP request. When the timeout fired, it attempted to kill the process but the reverse proxy continued waiting for a response indefinitely.

- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add brief mention of requestTimeout feature in the customizable features section of README. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Feat web config
Feat timeout
…thcheck Feat conditional rpc healthcheck
Adjust RPC health check parameters to reduce false positives when endpoints are under load and fix multiple security/correctness issues.

- increase RPC health check timeout from 500ms to 3s to handle busy servers
- decrease check interval from 30s to 10s for faster detection
- fix process_timeout_test missing context parameter
- fix config_embed to return cloned byte slice preventing mutation
- update ui dependencies: js-yaml 4.1.1, @codemirror/state 6.5.4, @codemirror/lang-yaml 6.1.2

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix editor cleanup and improve dark mode appearance with better colors, contrast, and styling.

- Add proper editor disposal in $effect cleanup
- Update theme colors for better dark mode visibility
- Improve button styling with teal export button
- Better text contrast and subtle borders
- Refine error message styling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Prevent stopCommand() from hanging when a timeout transitions StateStarting to StateStopping before cmd.Start() completes.

- Close cmdWaitChan when cmd.Start() fails
- Add guard in stopCommand() to skip wait if process never started
- Add tests for hang scenarios
- Fix TestProcess_RequestTimeout to avoid calling t.Fatal from handler goroutine

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix ParseRPCEndpoints to handle Windows shlex behavior where single-quoted strings with spaces are split into multiple arguments.

- Collect all non-flag arguments after --rpc flag
- Join them with space before parsing endpoint list
- Fixes test failure: TestParseRPCEndpoints_ValidFormats/multiple_endpoints_with_spaces_trimmed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
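The collect-and-rejoin step can be sketched as below. `collectRPCArgs` is an illustrative name; the real `ParseRPCEndpoints` also parses the rejoined string into endpoints, which is omitted here:

```go
package main

import (
	"fmt"
	"strings"
)

// collectRPCArgs gathers every token after --rpc that is not another
// flag and rejoins them with spaces, so an endpoint list that Windows
// shlex split into several arguments is reassembled before parsing.
func collectRPCArgs(args []string) string {
	for i, arg := range args {
		if arg != "--rpc" {
			continue
		}
		var parts []string
		for _, tok := range args[i+1:] {
			if strings.HasPrefix(tok, "-") {
				break // the next flag ends the endpoint list
			}
			parts = append(parts, tok)
		}
		return strings.Join(parts, " ")
	}
	return ""
}

func main() {
	// Windows shlex split "'host1:50052, host2:50052'" into two tokens.
	args := []string{"llama-server", "--rpc", "host1:50052,", "host2:50052", "--port", "8080"}
	fmt.Println(collectRPCArgs(args))
}
```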
Fix data race between Process.start() and Process.stopCommand() by using a cmdStarted flag instead of checking cmd.Process directly. Also fix hang when StopImmediately() is called during startup.

Changes:
- Add cmdStarted bool flag to track if Start() completed successfully
- Initialize cancelUpstream and cmdWaitChan before Start() so stopCommand() can access them during startup
- Always call cancelUpstream() in stopCommand() to cancel context, even if Start() hasn't completed
- Only wait on cmdWaitChan if cmdStarted is true to avoid hanging
- Reset cmdStarted=false when process exits
- Change cancelUpstream==nil error to debug message (expected in tests using forceState())
- Use platform-appropriate sleep command in tests (timeout on Windows, sleep on Unix)

Fixes race condition and Windows test failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
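The coordination pattern described above can be reduced to a minimal sketch: the wait channel exists before `Start()` is attempted, a failed `Start()` closes it, and `stopCommand` only waits when `cmdStarted` is true. Field and method names mirror the commit message but the struct itself is illustrative:

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

type procState struct {
	mu          sync.Mutex
	cmd         *exec.Cmd
	cmdStarted  bool // true only after Start() succeeded
	cmdWaitChan chan struct{}
}

func (p *procState) start(name string, args ...string) error {
	p.mu.Lock()
	p.cmd = exec.Command(name, args...)
	// Created before Start() so a concurrent stopCommand can see it.
	p.cmdWaitChan = make(chan struct{})
	p.mu.Unlock()

	if err := p.cmd.Start(); err != nil {
		close(p.cmdWaitChan) // unblock any stopCommand already waiting
		return err
	}

	p.mu.Lock()
	p.cmdStarted = true
	p.mu.Unlock()

	go func() {
		p.cmd.Wait()
		close(p.cmdWaitChan)
	}()
	return nil
}

func (p *procState) stopCommand() {
	p.mu.Lock()
	started := p.cmdStarted
	waitCh := p.cmdWaitChan
	p.mu.Unlock()

	if !started {
		return // process never started; nothing to kill or wait for
	}
	p.cmd.Process.Kill()
	<-waitCh // closes via Wait(); no hang even if Start() had failed
}

func main() {
	p := &procState{}
	err := p.start("definitely-not-a-real-binary")
	fmt.Println("start failed:", err != nil)
	p.stopCommand() // returns immediately instead of hanging
	fmt.Println("stop returned")
}
```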
Fix theme compartment sharing bug and improve error response handling.

- create separate Compartment instances for each CodeMirror editor
- update createEditor to accept compartment parameter
- improve saveConfig error handling to parse both JSON and non-JSON responses
- include status code and statusText in error messages
Fix race condition in StopImmediately where state changes between CurrentState() and swapState() could cause stop to abort. Add retry loop to handle ErrExpectedStateMismatch. Fix test race condition where assert.Error was called inside goroutine. Move error assertion to main test goroutine using buffered channel. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
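The retry pattern this commit adds is a classic compare-and-swap loop. A minimal sketch, with illustrative state constants and a simplified `swapState` (the real Process state machine has more states and side effects):

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrExpectedStateMismatch = errors.New("expected state mismatch")

const (
	StateReady int32 = iota
	StateStarting
	StateStopping
)

type state struct{ v atomic.Int32 }

func (s *state) CurrentState() int32 { return s.v.Load() }

// swapState only succeeds if the state is still what the caller observed.
func (s *state) swapState(expected, next int32) error {
	if !s.v.CompareAndSwap(expected, next) {
		return ErrExpectedStateMismatch
	}
	return nil
}

// stopImmediately retries the read-then-swap until it wins the race,
// instead of aborting when another goroutine changes the state between
// CurrentState() and swapState().
func stopImmediately(s *state) {
	for {
		cur := s.CurrentState()
		if cur == StateStopping {
			return // another goroutine is already stopping
		}
		if err := s.swapState(cur, StateStopping); err == nil {
			return
		}
		// state changed underneath us; re-read and retry
	}
}

func main() {
	s := &state{}
	s.v.Store(StateStarting)
	stopImmediately(s)
	fmt.Println("stopped:", s.CurrentState() == StateStopping)
}
```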
Fix getSleepCommand to use full path to timeout.exe on Windows. This avoids conflict with GNU coreutils timeout command in Git Bash environments on CI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separate request context from timeout monitoring context to avoid misattributing parent-imposed deadlines. Create monitoring context from context.Background() to reliably detect configured timeout, while maintaining request timeout for proper cancellation.

- requestCtx: with timeout for request cancellation
- timeoutCtx: from Background() for monitoring only
- prevents false positives from parent context deadlines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Feat web config
Feat timeout
…thcheck Feat conditional rpc healthcheck
I/O timeout errors now don't mark RPC endpoints as unhealthy. Timeouts are logged at debug level and health state is preserved.

- detect timeout errors using net.Error.Timeout()
- add test for timeout handling behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…thcheck proxy: ignore I/O timeout in RPC health checks
Fix cursor jumping to top after typing by preventing reactive effect from re-running on content changes. Use untrack() to read config state without creating reactive dependency, ensuring editor is only created once and not destroyed/recreated on each keystroke. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
ui-svelte: fix Config editor cursor jumping on input
merge upstream
Feat timeout
Resolved conflicts:
- Merged both Playground and Config features in App.svelte
- Combined dependencies from both branches in package.json
- Removed old ui/ files (replaced by ui-svelte/)
- Regenerated package-lock.json with merged dependencies
Merge latest upstream changes including:
- global TTL feature
- setParamsByID filter
- request/response capturing
- playground UI (chat, images, speech, transcription, rerank)
- incremental streaming markdown
- infill timings support
- in-flight request tracking
- smart auto-scroll in LogPanel
- copy button for code blocks
- OGG audio format support
- cuda13 architecture support

Conflict resolutions:
- proxy/process_test.go: kept context.Background() param + upstream var rename
- proxy/proxymanager_api.go: kept both config editor and captures APIs
- ui-svelte/package.json: kept both codemirror and markdown/katex deps
- ui-svelte/src/App.svelte: kept both Config and Playground routes
- ui-svelte/src/components/Header.svelte: fixed $location to $currentRoute
- ui/package*: accepted upstream deletion of old React UI
Merge upstream changes (globalTTL, captures, playground, streaming markdown, rerank, infill timings, in-flight tracking, setParamsByID) with fork features (config editor, timeout handling, RPC health checks, context parameter).

- resolve App.svelte conflict: keep Config + PlaygroundStub routes
Walkthrough: The pull request introduces request timeout enforcement for model inference, adds RPC health checking for distributed inference models with TCP endpoint monitoring, implements configuration API endpoints, replaces the React UI with a new Svelte-based UI including a configuration editor, updates the Makefile to build the Svelte UI, and refactors configuration hot-reloading in the main application.
Pull request overview
This PR merges upstream changes that add a web-based configuration editor to the Svelte UI, plus new backend support for reading/writing the running config file, and introduces two new model-level operational controls: RPC endpoint health gating and per-request timeouts.
Changes:
- Add a `/config` UI route with a CodeMirror YAML editor (current config editable, embedded example config read-only), including import/export and client-side YAML validation.
- Add authenticated backend API endpoints to fetch the current config file, fetch the embedded example config, and update the config (triggering a reload event).
- Introduce `rpcHealthCheck` (filter unhealthy RPC models from `/v1/models` and return 503 on inference) and `requestTimeout` (kill the model process if a request exceeds a configured duration), with accompanying schema/docs/tests.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| ui-svelte/src/routes/Config.svelte | New configuration editor page (CodeMirror + YAML validation + import/export/save). |
| ui-svelte/src/components/Header.svelte | Add navigation link to the new /config route. |
| ui-svelte/src/App.svelte | Register /config route in the SPA router. |
| ui-svelte/package.json | Add CodeMirror + js-yaml dependencies needed for the config editor. |
| ui-svelte/package-lock.json | Lockfile updates for the new UI dependencies. |
| README.md | Update feature list to mention request timeout protection and RPC health checking. |
| proxy/proxymanager.go | Pass shutdown context into process groups; filter /v1/models based on RPC health; add setters for config path/example. |
| proxy/proxymanager_api.go | Add /api/config/* endpoints to read/write config and trigger reload events. |
| proxy/processgroup.go | Thread shutdown context through to each Process instance. |
| proxy/processgroup_test.go | Update tests for new NewProcessGroup(..., shutdownCtx) signature. |
| proxy/process.go | Implement requestTimeout enforcement and background RPC endpoint health checking; improve stop/start race handling. |
| proxy/process_timeout_test.go | Add tests around request timeout behavior. |
| proxy/process_test.go | Update tests for new NewProcess(..., shutdownCtx) signature and add stop/start edge-case tests. |
| proxy/process_rpc_health_test.go | Add tests for RPC health checking behavior. |
| proxy/config/model_config.go | Add rpcHealthCheck and requestTimeout fields to model config with defaults. |
| proxy/config/config.go | Add ParseRPCEndpoints helper for extracting --rpc endpoints from model commands. |
| proxy/config/config_test.go | Add unit tests for ParseRPCEndpoints. |
| Makefile | Update UI build/install targets to use the Svelte UI directory. |
| llama-swap.go | Persist config path/example into each newly created ProxyManager; always listen for config-change events to reload. |
| docs/configuration.md | Document new config options/features (including requestTimeout and rpcHealthCheck). |
| config.example.yaml | Add examples/documentation for requestTimeout and distributed RPC health checking. |
| config-schema.json | Extend JSON schema with requestTimeout and rpcHealthCheck. |
| config_embed.go | Embed config.example.yaml into the binary for the UI “Example Config” view. |
Files not reviewed (1)
- ui-svelte/package-lock.json: Language not supported
```go
// Write to config file
if err := os.WriteFile(configPath, body, 0644); err != nil {
```

apiUpdateConfig writes the config with a fixed mode of 0644, which can unintentionally widen permissions (e.g., from 0600) and expose secrets such as API keys. Preserve the existing file mode (stat + write with same permissions) or default to a more restrictive mode (e.g., 0600) when creating the file.

Suggested change:

```diff
-// Write to config file
-if err := os.WriteFile(configPath, body, 0644); err != nil {
+// Determine file permissions: preserve existing mode if the file exists,
+// otherwise use a restrictive default (0600).
+var perm os.FileMode = 0600
+if info, statErr := os.Stat(configPath); statErr == nil {
+	perm = info.Mode().Perm()
+}
+// Write to config file
+if err := os.WriteFile(configPath, body, perm); err != nil {
```
```go
	pm.sendErrorResponse(c, http.StatusBadRequest, "Config file path not set")
	return
}
```

apiUpdateConfig is a state-changing endpoint reachable from browsers, but it doesn't include any CSRF mitigation (e.g., Origin/Referer validation or requiring a custom header like x-api-key). Because apiKeyAuth() accepts Basic auth (which browsers may automatically attach cross-site), a malicious site could potentially trigger a config overwrite if the user has authenticated. Consider rejecting requests with missing/mismatched Origin, or requiring x-api-key for mutating /api/config operations.

Suggested change:

```diff
+// Basic CSRF mitigation: require a custom header that browsers cannot send
+// with simple cross-site form submissions or image requests.
+if c.GetHeader("X-API-Key") == "" {
+	pm.sendErrorResponse(c, http.StatusForbidden, "Missing X-API-Key header")
+	return
+}
```
```go
ctx, cancel := context.WithCancel(p.shutdownCtx)
p.rpcHealthCancel = cancel
p.rpcHealthTicker = time.NewTicker(10 * time.Second)
```

RPC health check interval is hard-coded to 10s (time.NewTicker(10 * time.Second)), but the docs/config examples added in this PR state 30 seconds. Either update the documentation to match, or make the interval configurable so behavior and docs stay consistent.

Suggested change:

```diff
-p.rpcHealthTicker = time.NewTicker(10 * time.Second)
+p.rpcHealthTicker = time.NewTicker(30 * time.Second)
```
```go
allHealthy := true

for _, endpoint := range p.rpcEndpoints {
	dialer := net.Dialer{Timeout: 3 * time.Second}
	conn, err := dialer.Dial("tcp", endpoint)
	if err != nil {
		// Ignore I/O timeout errors - don't mark as unhealthy
		if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
			p.proxyLogger.Debugf("<%s> RPC endpoint %s timeout (ignoring): %v", p.ID, endpoint, err)
			continue
		}
		p.proxyLogger.Warnf("<%s> RPC endpoint %s unhealthy: %v", p.ID, endpoint, err)
		allHealthy = false
		break
	}
	conn.Close()
}

wasHealthy := p.rpcHealthy.Load()
```

checkRPCHealth() treats TCP dial timeouts as "ignore" and continues, which can result in allHealthy staying true and the model being considered healthy even when an endpoint is unreachable (a timeout is usually a strong signal of unhealthiness). If you want to avoid flapping, consider keeping the previous health state on timeout or marking the endpoint unhealthy after N consecutive timeouts, rather than counting timeouts as healthy.

Suggested change:

```diff
-allHealthy := true
-for _, endpoint := range p.rpcEndpoints {
-	dialer := net.Dialer{Timeout: 3 * time.Second}
-	conn, err := dialer.Dial("tcp", endpoint)
-	if err != nil {
-		// Ignore I/O timeout errors - don't mark as unhealthy
-		if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
-			p.proxyLogger.Debugf("<%s> RPC endpoint %s timeout (ignoring): %v", p.ID, endpoint, err)
-			continue
-		}
-		p.proxyLogger.Warnf("<%s> RPC endpoint %s unhealthy: %v", p.ID, endpoint, err)
-		allHealthy = false
-		break
-	}
-	conn.Close()
-}
-wasHealthy := p.rpcHealthy.Load()
+// Start with the assumption that all endpoints are healthy; adjust based on checks.
+allHealthy := true
+wasHealthy := p.rpcHealthy.Load()
+anyTimeout := false
+anyFailure := false
+for _, endpoint := range p.rpcEndpoints {
+	dialer := net.Dialer{Timeout: 3 * time.Second}
+	conn, err := dialer.Dial("tcp", endpoint)
+	if err != nil {
+		// For I/O timeout errors, record the timeout but don't immediately flip health.
+		if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
+			p.proxyLogger.Debugf("<%s> RPC endpoint %s timeout (ignoring for health flip): %v", p.ID, endpoint, err)
+			anyTimeout = true
+			continue
+		}
+		// Non-timeout errors are treated as unhealthy.
+		p.proxyLogger.Warnf("<%s> RPC endpoint %s unhealthy: %v", p.ID, endpoint, err)
+		allHealthy = false
+		anyFailure = true
+		break
+	}
+	conn.Close()
+}
+// If there were only timeouts (no definite successes or failures), keep the previous state
+// instead of forcing the health to healthy.
+if anyFailure {
+	allHealthy = false
+} else if anyTimeout && !anyFailure {
+	allHealthy = wasHealthy
+} else {
+	// No failures and no timeouts: all endpoints responded successfully.
+	allHealthy = true
+}
```
```yaml
# rpcHealthCheck: enable TCP health checks for RPC endpoints
# - optional, default: false
# - when enabled, parses --rpc host:port[,host:port,...] from cmd
# - performs TCP connectivity checks every 30 seconds
```

This comment says RPC connectivity checks run "every 30 seconds", but the implementation introduced in proxy/process.go runs the ticker every 10 seconds. Please align the example documentation with the actual behavior (or make the interval configurable).

Suggested change:

```diff
-# - performs TCP connectivity checks every 30 seconds
+# - performs TCP connectivity checks every 10 seconds
```
```json
"rpcHealthCheck": {
  "type": "boolean",
  "default": false,
  "description": "Enable TCP health checks for RPC endpoints specified in cmd. When enabled, parses --rpc host:port[,host:port,...] from cmd and performs health checks every 30 seconds. Models with unhealthy RPC endpoints are filtered from /v1/models and return 503 on inference requests."
```

Schema description says RPC health checks run "every 30 seconds", but the current implementation introduced in proxy/process.go uses a 10-second ticker. Please update the schema description (or the code) so they match.

Suggested change:

```diff
-"description": "Enable TCP health checks for RPC endpoints specified in cmd. When enabled, parses --rpc host:port[,host:port,...] from cmd and performs health checks every 30 seconds. Models with unhealthy RPC endpoints are filtered from /v1/models and return 503 on inference requests."
+"description": "Enable TCP health checks for RPC endpoints specified in cmd. When enabled, parses --rpc host:port[,host:port,...] from cmd and performs health checks every 10 seconds. Models with unhealthy RPC endpoints are filtered from /v1/models and return 503 on inference requests."
```
```go
// TestProcess_RequestTimeout verifies that requestTimeout actually kills the process
func TestProcess_RequestTimeout(t *testing.T) {
	// Create error channel to report handler errors from the mock server goroutine
	srvErrCh := make(chan error, 1)

	// Create a mock server that simulates a long-running inference
	mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		t.Logf("Mock server received request")

		// Simulate streaming response that takes 60 seconds
		w.Header().Set("Content-Type", "text/event-stream")
		w.WriteHeader(http.StatusOK)

		flusher, ok := w.(http.Flusher)
		if !ok {
			srvErrCh <- fmt.Errorf("Expected http.ResponseWriter to be an http.Flusher")
			return
		}

		// Stream data for 60 seconds
		for i := 0; i < 60; i++ {
			select {
			case <-r.Context().Done():
				t.Logf("Mock server: client disconnected after %d seconds", i)
				return
			default:
				fmt.Fprintf(w, "data: token %d\n\n", i)
				flusher.Flush()
				time.Sleep(1 * time.Second)
			}
		}
		t.Logf("Mock server completed full 60 second response")
	}))
	defer mockServer.Close()

	// Setup process logger - use NewLogMonitor() to avoid race in test
	processLogger := NewLogMonitor()
	proxyLogger := NewLogMonitor()

	// Create process with 5 second request timeout
	cfg := config.ModelConfig{
		Proxy:          mockServer.URL,
		CheckEndpoint:  "none", // skip health check
		RequestTimeout: 5,      // 5 second timeout
	}

	p := NewProcess("test-timeout", 30, cfg, processLogger, proxyLogger, context.Background())
	p.gracefulStopTimeout = 2 * time.Second // shorter for testing

	// Manually set state to ready (skip actual process start)
	p.forceState(StateReady)
```

This test description says it verifies requestTimeout "kills the process", but the test forces the process into StateReady without starting an upstream command. In that setup StopImmediately() can't actually terminate a real process (and stopCommand() short-circuits when cancelUpstream is nil), so the test is really only validating request cancellation/timeout behavior. Consider renaming/rewording the test (or adjusting it) so the assertion matches what is actually exercised.
```go
func TestProcess_RPCHealthCheckTimeoutIgnored(t *testing.T) {
	testLogger := NewLogMonitorWriter(io.Discard)
	proxyLogger := NewLogMonitorWriter(io.Discard)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Use an IP address that will timeout (non-routable IP)
	// 192.0.2.0/24 is reserved for documentation/testing (RFC 5737)
	modelConfig := config.ModelConfig{
		Cmd:            "llama-server --rpc 192.0.2.1:50051",
		Proxy:          "http://localhost:8080",
		RPCHealthCheck: true,
	}

	process := NewProcess("test-model", 5, modelConfig, testLogger, proxyLogger, ctx)

	// Verify endpoints were parsed
	assert.NotEmpty(t, process.rpcEndpoints, "RPC endpoints should be parsed from cmd")
	assert.Equal(t, []string{"192.0.2.1:50051"}, process.rpcEndpoints)

	// Initially should be unhealthy (false) until first check
	assert.False(t, process.rpcHealthy.Load(), "RPC health should start as false")

	// Manually run health check - this should timeout but not mark as unhealthy
	process.checkRPCHealth()

	// After timeout, should remain at initial state (false) but not be marked unhealthy
	// The key is that timeout doesn't change the state - it's effectively a no-op
	// To test this properly, let's set it to healthy first, then see if timeout changes it
	process.rpcHealthy.Store(true)
	initialState := process.rpcHealthy.Load()
	assert.True(t, initialState, "Should be healthy before timeout check")

	// Run health check that will timeout
	process.checkRPCHealth()

	// After timeout, should still be healthy (timeout is ignored)
	assert.True(t, process.rpcHealthy.Load(), "Should remain healthy after timeout")
}
```

TestProcess_RPCHealthCheckTimeoutIgnored assumes dialing 192.0.2.1:50051 will reliably time out, but on some OS/network setups it may fail fast (e.g., "no route to host"), making the test flaky and environment-dependent. Consider refactoring RPC health checking to allow injecting a dial function (so you can deterministically return a timeout), or otherwise avoid asserting on timeout-specific behavior using real networking.
No description provided.